3,485 research outputs found

    Controlling Network Latency in Mixed Hadoop Clusters: Do We Need Active Queue Management?

    With the advent of big data, data center applications are processing vast amounts of unstructured and semi-structured data, in parallel on large clusters, across hundreds to thousands of nodes. The highest performance for these batch big data workloads is achieved using expensive network equipment with large buffers, which accommodate bursts in network traffic and allocate bandwidth fairly even when the network is congested. Throughput-sensitive big data applications are, however, often executed in the same data center as latency-sensitive workloads. For both workloads to be supported well, the network must provide both maximum throughput and low latency. Progress has been made in this direction, as modern network switches support Active Queue Management (AQM) and Explicit Congestion Notifications (ECN), both mechanisms to control the level of queue occupancy, reducing the total network latency. This paper is the first study of the effect of Active Queue Management on both throughput and latency, in the context of Hadoop and the MapReduce programming model. We give a quantitative comparison of four different approaches for controlling buffer occupancy and latency: RED and CoDel, both standalone and also combined with ECN and the DCTCP network protocol, and identify the AQM configurations that maintain Hadoop execution time gains from larger buffers within 5%, while reducing network packet latency caused by bufferbloat by up to 85%. Finally, we provide recommendations to administrators of Hadoop clusters as to how to improve latency without degrading the throughput of batch big data workloads.
    The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007–2013) under grant agreement number 610456 (Euroserver). The research was also supported by the Ministry of Economy and Competitiveness of Spain under the contracts TIN2012-34557 and TIN2015-65316-P, Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), HiPEAC-3 Network of Excellence (ICT-287759), and the Severo Ochoa Program (SEV-2011-00067) of the Spanish Government. Peer Reviewed. Postprint (author's final draft)

    Interconnect Energy Savings and Lower Latency Networks in Hadoop Clusters: The Missing Link

    An important challenge of modern data centres running Hadoop workloads is to minimise energy consumption, a significant proportion of which is due to the network. Significant network savings are already possible using Energy Efficient Ethernet, supported by a large number of NICs and switches, but recent work has demonstrated that the packet coalescing settings must be carefully configured to avoid a substantial loss in performance. Meanwhile, Hadoop is evolving from its original batch concept to become a more iterative type of framework. Other recent work attempts to reduce Hadoop's network latency using Explicit Congestion Notifications. Linking these studies reveals that, surprisingly, even when packet coalescing does not hurt performance, it can degrade network latency much more than previously thought. This paper is the first to analyze the impact of packet coalescing in the context of network latency. We investigate how to design and configure interconnects to provide the maximum energy savings without degrading cluster throughput performance or network latency.
    The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007–2013) under grant agreement number 610456 (Euroserver). The research was also supported by the Ministry of Economy and Competitiveness of Spain under the contracts TIN2012-34557 and TIN2015-65316-P, Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), HiPEAC-3 Network of Excellence (ICT-287759), and the Severo Ochoa Program (SEV-2011-00067) of the Spanish Government. Peer Reviewed. Postprint (author's final draft)

    Running stream-like programs on heterogeneous multi-core systems

    All major semiconductor companies are now shipping multi-cores. Phones, PCs, laptops, and mobile internet devices will all require software that can make effective use of these cores. Writing high-performance parallel software is difficult, time-consuming and error prone, increasing both time-to-market and cost. Software outlives hardware; it typically takes longer to develop new software than hardware, and legacy software tends to survive for a long time, during which the number of cores per system will increase. Development and maintenance productivity will be improved if parallelism and technical details are managed by the machine, while the programmer reasons about the application as a whole. Parallel software should be written using domain-specific high-level languages or extensions. These languages reveal implicit parallelism, which would be obscured by a sequential language such as C. When memory allocation and program control are managed by the compiler, the program's structure and data layout can be safely and reliably modified by high-level compiler transformations. One important application domain contains so-called stream programs, which are structured as independent kernels interacting only through one-way channels, called streams. Stream programming is not applicable to all programs, but it arises naturally in audio and video encode and decode, 3D graphics, and digital signal processing. This representation enables high-level transformations, including kernel unrolling and kernel fusion. This thesis develops new compiler and run-time techniques for stream programming. The first part of the thesis is concerned with a statically scheduled stream compiler. It introduces a new static partitioning algorithm, which determines which kernels should be fused, in order to balance the loads on the processors and interconnects. A good partitioning algorithm is crucial if the compiler is to produce efficient code. 
The algorithm also takes account of downstream compiler passes (specifically software pipelining and buffer allocation) and it models the compiler's ability to fuse kernels. The latter is important because the compiler may not be able to fuse arbitrary collections of kernels. This thesis also introduces a static queue sizing algorithm. This algorithm is important when memory is distributed, especially when local stores are small. The algorithm takes account of latencies and variations in computation time, and is constrained by the sizes of the local memories. The second part of this thesis is concerned with dynamic scheduling of stream programs. First, it investigates the performance of known online, non-preemptive, non-clairvoyant dynamic schedulers. Second, it proposes two dynamic schedulers for stream programs. The first is specifically for one-dimensional stream programs. The second is more general: it does not need to be told the stream graph, but it has slightly larger overhead. This thesis also introduces some support tools related to stream programming. StarssCheck is a debugging tool, based on Valgrind, for the StarSs task-parallel programming language. It generates a warning whenever the program's behaviour contradicts a pragma annotation. Such behaviour could otherwise lead to exceptions or race conditions. StreamIt to OmpSs is a tool to convert a streaming program in the StreamIt language into a dynamically scheduled task based program using StarSs. Postprint (published version)

    Harmonic Riemannian manifolds

    This thesis describes work that arose out of a study of harmonic Riemannian manifolds. A definition of harmonicity is given, and from this it is shown how the Ledger conditions on the curvature of a harmonic manifold may be derived in principle; the first four are written down. The first three Ledger conditions are put into local co-ordinate form and simpler conditions are derived, the most important being the super-Einstein condition. The idea of the Schur property is also introduced. The mean-value work of Gray and Willmore is described and extended as far as the $r^8$ term under some simplifying conditions. Finally, there is an investigation of the extent to which the compact classical simple Lie groups with bi-invariant metrics can satisfy Ledger’s first three conditions.

    Active filter current compensation for transmission optimisation

    This dissertation is based on the fact that any m-wire electrical system can be modelled as m equivalent Thevenin voltages and impedances when viewed from any node. The dissertation describes how to calculate the optimal distribution of currents so that a specific amount of power can flow through and reach the network's equivalent Thevenin voltages with minimal losses. The approach uses a recently patented method that calculates the optimal current for each wire; these currents are shown to be obtainable from the Thevenin parameters and the power flow at any instant in time at any node. Once the ideal currents are found, they can be realised by active and passive devices that inject a specific amount of power (positive and negative) so as to compensate the existing currents. The focus is particularly on proof of concept through simulations and physical experiments, covering work not specifically described in the patent, with more emphasis on the optimisation to active compensation. It is explained and shown how this can be implemented using the Malengret and Gaunt method. This method reduces the cost in applications where not all the currents need to be processed through a converter (e.g. an inverter) but only the difference between the existing and desired optimal currents. A smaller shunt parallel converter can result in ideal current flow without the need for interrupting the currents as described in the present patent. The methodology is explained and demonstrated by simulation.

    Passive Energy Management through Increased Thermal Capacitance

    Energy usage worldwide is increasing at a drastic rate. Buildings currently consume a major share of the total energy used within the United States, and most of this energy usage supports heating and cooling. This demand shows that new passive energy management systems are needed. The use of Increased Thermal Capacitance (ITC) is proposed as a new passive energy management system. To increase thermal capacitance, a piping system is added into either a building’s walls or its ceiling. In this paper, a building with ITC added is compared to a similar building without ITC using the simulation program TRNSYS. Along with a comparison between the walls and ceiling, several parameters are analyzed for their effect on the performance of the ITC. ITC was found to be effective, especially when located in the ceiling, with the location, specific heat and tank size being the most important factors.

    Deforestation in Nineteenth-Century Maine: The Record of Henry David Thoreau

    Thoreau’s Maine Woods, a record of three trips made between 1846 and 1857, offers a combination of literary metaphor and precise botanical and topographical observation. Comparing Thoreau’s journals with recent advances in forest ecology, author Geoffrey Paul Carpenter reveals a detailed picture of the various ways in which logging activity changed the forests, lakes, and rivers of Maine. Carpenter demonstrates that a precise understanding of forest history depends not only on traditional statistical sources, but also on the subjective personal testimony found in the literary record

    Rank-normalization, folding, and localization: An improved $\widehat{R}$ for assessing convergence of MCMC

    Markov chain Monte Carlo is a key computational tool in Bayesian statistics, but it can be challenging to monitor the convergence of an iterative stochastic algorithm. In this paper we show that the convergence diagnostic $\widehat{R}$ of Gelman and Rubin (1992) has serious flaws. Traditional $\widehat{R}$ will fail to correctly diagnose convergence failures when the chain has a heavy tail or when the variance varies across the chains. In this paper we propose an alternative rank-based diagnostic that fixes these problems. We also introduce a collection of quantile-based local efficiency measures, along with a practical approach for computing Monte Carlo error estimates for quantiles. We suggest that common trace plots should be replaced with rank plots from multiple chains. Finally, we give recommendations for how these methods should be used in practice. Comment: Minor revision for improved clarity